Conventional and genetic risk factors for chronic Hepatitis B virus infection in a community-based study of 0.5 million Chinese adults

Despite universal vaccination of newborns, the prevalence of chronic hepatitis B virus (HBV) infection and the associated disease burden remain high among adults in China. We investigated risk factors for chronic HBV infection in a community-based study of 512,726 individuals aged 30–79 years recruited from ten diverse areas during 2004–2008. Multivariable logistic regression was used to estimate odds ratios (ORs) of hepatitis B surface antigen (HBsAg) positivity recorded at baseline by sociodemographic and lifestyle factors, and medical history. In a random subset (n = 69,898) we further assessed the associations of 18 single nucleotide polymorphisms (SNPs), previously shown to be associated with HBsAg positivity, with HBsAg positivity and with development of chronic liver disease (CLD; 1600 cases). Several factors showed strong associations with HBsAg positivity, particularly younger age (< 40 vs. ≥ 60 years: OR 1.48, 95% CI 1.32–1.66), male sex (1.40, 1.34–1.46) and urban residency (1.55, 1.47–1.62). Of the 18 SNPs selected, 17 were associated with HBsAg positivity and 14 with CLD, with SNPs near HLA-DPB1 most strongly associated with both outcomes. In Chinese adults, a range of genetic and non-genetic factors were associated with chronic HBV infection and CLD, and these can inform targeted screening to help prevent disease progression.

1-29 years 9 . Furthermore, in addition to conventional risk factors, genome-wide association studies (GWAS) have identified several genetic variants associated with chronic HBV. Most of these variants lie in human leukocyte antigen (HLA) loci, which play a critical role in the host immune response to viral infection through antigen presentation 10; polymorphisms in these loci can alter the efficacy of antigen binding and the T-cell response, affecting viral clearance 11. Other genes (including some outside the HLA region of the genome) also influence the likelihood of viral persistence or clearance by altering the magnitude of adaptive or innate immune responses 11. Existing GWAS were relatively small case-control studies in constrained geographical areas, recruiting cases from hospitals or liver cancer screening units 12–16, and further research examining how these genetic variants relate to chronic HBV risk in a large, geographically diverse population sample is warranted.
Further knowledge about risk factors associated with HBV chronicity in adults may help ongoing efforts to reduce the chronic HBV burden in China by informing targeted testing of higher-risk individuals, so that infected individuals can be identified and brought into the chronic HBV care continuum to receive appropriate treatment and care 17,18. We used a large community-based cohort study of middle-aged adults from ten geographically diverse sites in China to assess both genetic and non-genetic risk factors associated with chronic HBV.

Methods
Study population. The China Kadoorie Biobank (CKB) study design has been described in detail elsewhere 19. A baseline survey was conducted in 2004-2008 among 512,726 men and women aged 30-79 years, recruited from five urban and five rural geographically diverse areas in China. Potentially eligible participants were identified through official residential records in each of 100-150 administrative units (rural villages and urban residential committees) within each region. Trained health workers administered laptop-based questionnaires at local study clinics, collecting information on sociodemographic and lifestyle factors (e.g. smoking, alcohol consumption, diet, physical activity) and medical history (e.g. history of blood transfusion, self-reported health, and medical conditions diagnosed by a doctor, including any history of chronic hepatitis or cirrhosis). Blood pressure, lung function and anthropometric measures were recorded using standard protocols, and a non-fasting venous blood sample was collected at baseline for on-site tests and long-term storage. Resurveys following similar procedures were conducted in 2008 and 2013-2014 among a subset (4-5%) of surviving participants. Vital status was determined periodically through national death registries, and episodes of hospitalization were collected via linkage to disease registries and the national health insurance claims database, which has almost universal coverage in the study areas. Disease events were coded using the International Classification of Diseases, 10th Revision (ICD-10). Prior international, national and regional ethics approvals were obtained, and all participants provided written informed consent.

Measurement of chronic HBV infection. Hepatitis B surface antigen (HBsAg) was measured in all participants at the baseline visit using a point-of-care, lateral flow rapid diagnostic test (RDT), in which participants' venous whole blood was applied to an on-site rapid test strip (ACON dipstick). Results were recorded as positive, negative, or unclear. Antibodies to hepatitis B core antigen (anti-HBc) and hepatitis B e antigen (anti-HBe) were additionally measured, using a Luminex-based multiplex serology panel 20, in stored plasma samples from a randomly selected subcohort of 2000 participants who were alive and cancer-free after two years of follow-up.

Genotyping and genetic variant selection. The present study used a subset of 75,982 samples genotyped using a custom-designed 800K-SNP array (Axiom; Affymetrix). This sample was approximately representative of the overall CKB cohort: samples were selected by box of DNA samples, prioritising individuals from second resurvey study clinics (which were representative of the cohort) or at random from other recruitment sites. After exclusion of 4945 individuals who were regional population outliers based on genomic principal components analysis within regions, 71,037 people remained for the genetic analyses of chronic liver disease (CLD); after further exclusion of 1139 participants with missing HBsAg data, 69,898 participants were included in the main genetic analyses of HBsAg positivity (Supplementary Fig. 1). Replication of genetic associations was performed for 18 SNPs previously found to be associated with chronic HBV infection in prior GWAS.
These were identified by searching the US National Human Genome Research Institute Catalog of Published GWAS 21 for "hepatitis B virus infection trait" (Trait: EFO_0004197, searched August 2020), limiting findings to GWAS reporting SNPs associated with persistent HBV infection or susceptibility to HBV infection, where the finding had been replicated (either in the same study or in a subsequent GWAS). Studies 10,12–14,16,22–24

Disease outcomes in genetic analyses. CLD was defined as prevalent or incident liver disease. Prevalent liver disease included participants reporting cirrhosis/chronic hepatitis or liver cancer diagnosed by a doctor at the baseline survey. Incident liver disease was captured through the electronic health record linkage described above and included cirrhosis (ICD-10: K70, K74), hepatic failure (K72) and liver cancer (C22). Of the 71,037 participants in the randomly selected genotyped sample, 1600 (2.3%) had CLD at baseline or during follow-up (Supplementary Fig. 1).
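As an illustration of how the CLD outcome definition combines baseline self-report with record linkage, the Python sketch below flags a participant as a CLD case. It assumes, hypothetically, that baseline self-reports are available as booleans and linked hospital or registry diagnoses as ICD-10 code strings such as "K74.6"; the function and argument names are illustrative and not taken from the CKB data dictionary. Only the three-character categories listed in the text are used.

```python
from typing import Iterable

# ICD-10 three-character categories listed in the text for incident CLD:
# K70 and K74 (cirrhosis), K72 (hepatic failure), C22 (liver cancer).
CLD_ICD10_CATEGORIES = {"K70", "K72", "K74", "C22"}


def icd10_category(code: str) -> str:
    """Return the three-character ICD-10 category, e.g. 'K74.6' -> 'K74'."""
    return code.replace(".", "").strip().upper()[:3]


def has_cld(
    self_reported_cirrhosis_or_hepatitis: bool,
    self_reported_liver_cancer: bool,
    linked_icd10_codes: Iterable[str],
) -> bool:
    """Flag prevalent (baseline self-report) or incident (linked record) CLD."""
    prevalent = self_reported_cirrhosis_or_hepatitis or self_reported_liver_cancer
    incident = any(
        icd10_category(code) in CLD_ICD10_CATEGORIES for code in linked_icd10_codes
    )
    return prevalent or incident
```

Matching on the three-character category means any subcode (e.g. K74.6) counts, while codes outside the listed categories (e.g. K76.0) do not.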
www.nature.com/scientificreports/
Statistical analysis. Individuals with missing body mass index (BMI) (n = 2) or missing/unclear HBsAg (n = 11,733) data were excluded, leaving 500,991 participants for the main analysis. Prevalence estimates for HBsAg and HBV antibodies were calculated by baseline characteristics, standardized by age (10-year categories) and study site, among men and women separately. Logistic regression models were used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for HBsAg positivity associated with a range of baseline characteristics, including demographic, socioeconomic, behavioural and medical risk factors. The basic model was adjusted for age (5-year categories), sex and study site (10 areas). A forward selection method was then used to determine which factors to include in the multivariable model: the likelihood ratio test (LRT) compared the basic model with models sequentially adding risk factors, and factors that significantly improved model fit were retained (see Supplementary Tables 3 and 4 and Methods S1 for details). Because more than 90% of participants regularly consumed vegetables and more than 90% were currently married, these factors were not included in the multivariable analysis. Tests of trend were performed across ordered categorical variables. Genetic associations were analysed using an additive model for individual SNPs. SNPs were orientated so that the risk allele was the allele associated with HBsAg positivity in existing GWAS. Logistic regression models for HBsAg status (positive vs. negative) and CLD (yes vs. no) were fitted separately within each of the ten study sites, adjusting for age (10-year categories), sex and up to ten region-specific principal components. Inverse-variance-weighted fixed-effect meta-analyses were used to combine the site-specific estimates into overall estimates and 95% CIs.
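The final pooling step described above follows the standard inverse-variance-weighted fixed-effect approach: each site-specific log OR is weighted by the reciprocal of its squared standard error. The Python sketch below is a minimal illustration, not the authors' code (the analyses were run in R and PLINK); it assumes each site contributes a log OR and its standard error, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effect(
    log_ors: Sequence[float], ses: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Pool site-specific log odds ratios by inverse-variance weighting.

    Returns the pooled OR with its lower and upper confidence limits.
    """
    if len(log_ors) != len(ses) or not log_ors:
        raise ValueError("need one standard error per (non-empty) estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))  # SE of the pooled log OR
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover the SE of a log OR from a symmetric 95% CI on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)
```

For example, `ivw_fixed_effect([math.log(1.2), math.log(1.3), math.log(1.1)], [0.1, 0.1, 0.1])` uses made-up values for three hypothetical sites; with equal standard errors the pooled OR reduces to the geometric mean of the site ORs. `se_from_ci` is a convenience for cases where only a published OR and 95% CI are available.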
Additional analyses were conducted to investigate the risk of progression to CLD among HBsAg-positive participants and, in the subset of 2000 people with HBV antibody data, the risk of HBsAg positivity among those with evidence of HBV exposure (anti-HBc positive).
All analyses used R v4.0.2 and PLINK v2.0 25. We considered two-tailed p < 0.05 as evidence of an association. To account for multiple testing in the genetic analyses, we applied a Bonferroni correction, dividing the significance level of 0.05 by the 18 SNPs tested (i.e., approximately 0.003).
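The Bonferroni threshold is simple arithmetic. A minimal Python sketch, with illustrative names, shows how it would be applied to a set of per-SNP p-values.

```python
ALPHA = 0.05
N_SNPS = 18

# Exact Bonferroni threshold for 18 tests: about 0.00278.
bonferroni_alpha = ALPHA / N_SNPS


def passes_bonferroni(p_values, alpha=ALPHA):
    """Return, for each p-value, whether it falls below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

The exact threshold is 0.05/18 ≈ 0.0028, slightly stricter than the rounded value of 0.003, so a p-value falling between the two would be classified differently depending on which value is used.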
Role of the funding source. The funders of the study had no role in study design, collection, analysis or interpretation of data, or in the writing of the report.

Results
Baseline characteristics and prevalence of chronic HBV infection. Among the 500,991 participants included, the mean (SD) age was 52.1 (10.7) years, 41.0% were men, 55.4% lived in a rural area and 18.2% had attained primary or middle school education. The overall HBsAg prevalence was 3.0% (n = 15,552), higher in men (3.4%) than in women (2.8%). HBsAg prevalence decreased with age, particularly in men (Fig. 1A,B), and varied across study areas (Fig. 1C,D), with the highest prevalence in Haikou in the south (women: 4.8%; men: 6.4%) and the lowest in Gansu in the west (women: 1.8%; men: 1.9%). Overall, urban sites had a higher prevalence than rural sites among both women (3.3% vs. 2.4%) and men (4.0% vs. 3.1%). HBsAg prevalence was higher in those with less education, agricultural workers and those with lower household income, while prevalence decreased with increasing number of years with a household fridge (Supplementary Fig. 2). Of HBsAg-positive participants, 11.3% reported chronic hepatitis or cirrhosis at baseline, of whom 18.9% were on current treatment (Table 1). In the subcohort with HBV antibodies measured, overall seroprevalence was 45.0% for anti-HBc and 44.8% for anti-HBe; seropositivity for both antibodies was higher in males than females and increased with age (Supplementary Table 5). Table 2 shows the relationships between a range of conventional risk factors measured at baseline and HBsAg positivity. Compared with their counterparts, participants who were younger, male, resident in urban sites, underweight, with no formal education, with a history of blood transfusion or with poor self-reported health status at baseline had higher odds of HBsAg positivity, while the converse was true for people with higher household income, occasional alcohol intake, longer use of a household fridge and who were overweight (all p < 0.001). Of these, the strongest associations were seen with age (< 40 vs. ≥ 60 years old: OR   Table 6).
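The sex-specific prevalences above were, per the Statistical analysis, standardized by age (10-year categories) and study site. A minimal Python sketch of direct standardization follows: stratum-specific prevalences in a subgroup are weighted by each stratum's share of a standard population. This section does not state which standard population was used, so treating, for example, the whole cohort as the standard is an assumption, as is the handling of strata absent from a subgroup (here they are dropped and the weights renormalized). The function name is hypothetical.

```python
from typing import Dict, Hashable, Tuple


def directly_standardized_prevalence(
    group: Dict[Hashable, Tuple[int, int]],
    standard: Dict[Hashable, int],
) -> float:
    """Directly standardized prevalence of a subgroup.

    group maps each stratum, e.g. (age_band, site), to (cases, participants)
    in the subgroup of interest; standard maps strata to participant counts
    in the standard population. Strata missing from the subgroup are dropped
    and the remaining standard weights renormalized (an assumption).
    """
    shared = [k for k in standard if k in group and group[k][1] > 0]
    if not shared:
        raise ValueError("no strata shared between subgroup and standard")
    total = sum(standard[k] for k in shared)
    # Weighted sum of stratum-specific prevalences (cases / participants).
    return sum(standard[k] / total * group[k][0] / group[k][1] for k in shared)
```

If every stratum in a subgroup has the same prevalence, the standardized value equals that common prevalence whatever the standard weights; differences between subgroups therefore reflect stratum-specific prevalences rather than differences in age or site composition.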
Among the 18 SNPs studied, the risk allele frequency (RAF) varied by study site (Supplementary Table 7), with up to a twofold difference for certain SNPs (e.g. rs652888 G allele RAF 0.16 in Henan and 0.31 in Liuzhou). Overall, 17 SNPs were associated with higher odds of HBsAg positivity, 13 of which passed the significance threshold after multiple-testing adjustment (  Table 8). In analyses of anti-HBc-positive participants (n = 769 from the subset of 2000, of whom 49 were HBsAg positive), only rs7453920 showed an association with HBsAg positivity (OR 3.22, 95% CI 1.35-9.62) (Supplementary Table 9).
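Under the additive model used in this study, each participant is coded by the number of risk alleles carried (0, 1 or 2), after orienting each SNP so that the counted allele is the GWAS risk allele. Site-specific risk allele frequencies can then be computed as sketched below in Python; the function names are hypothetical, and the sketch assumes hard-called or dosage genotypes with no missing values.

```python
from typing import Sequence


def risk_allele_dosage(coded_allele_dosage: float, coded_is_risk: bool) -> float:
    """Orient an additive genotype (0, 1 or 2 copies) to count the risk allele."""
    if coded_is_risk:
        return float(coded_allele_dosage)
    # The coded allele is the non-risk allele, so count the other one instead.
    return 2.0 - coded_allele_dosage


def risk_allele_frequency(dosages: Sequence[float]) -> float:
    """Risk allele frequency: risk alleles counted over two alleles per person."""
    if not dosages:
        raise ValueError("no genotypes supplied")
    return sum(dosages) / (2 * len(dosages))
```

Applied to the example in the text, 0.31/0.16 ≈ 1.94, i.e. roughly the twofold difference in rs652888 RAF between Liuzhou and Henan.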

Discussion
This large nationwide study of Chinese adults presents findings on both non-genetic and genetic risk factors associated with chronic HBV infection. While HBsAg prevalence was 3% in the overall cohort, it varied greatly by study site, and younger age, male sex, socioeconomic factors, alcohol intake and BMI were strongly correlated with HBsAg positivity. We also replicated findings of existing GWAS for genetic variants previously associated with chronic HBV, and showed that several of these were additionally associated with risk of CLD. This is the largest population-based study in China to assess both non-genetic and genetic risk factors associated with chronic hepatitis B infection, highlighting the potential contribution of numerous risk factors. Several large nationwide surveys in China have previously reported regional variation in HBsAg prevalence. Historically, rates of chronic HBV have been higher in rural, western regions of China, but with widespread urbanization and mass migration of rural workers to large coastal cities and eastern provinces, these patterns have been shifting 26. A nationally representative serosurvey conducted in 2006 27 of ≈ 41,000 people aged 1-59 years from 31 provinces, which measured HBsAg in serum samples using ELISA, reported higher HBsAg prevalence in western (8.3%) and rural (7.3%) areas than in eastern (6.5%) and urban (6.8%) areas.

Figure 1. Prevalence of Hepatitis B surface antigen by age, sex and study area. Hepatitis B surface antigen (HBsAg) prevalences (95% CI) are standardized by age (10-year categories) and study site (ten sites) among men and women separately, and stratified by urban or rural location. (a) HBsAg prevalence in men by age category, (b) HBsAg prevalence in women by age category, (c) HBsAg prevalence in men by study site, (d) HBsAg prevalence in women by study site. HBsAg, hepatitis B surface antigen.
However, there was large variation within these broad geographical regions; for example, in Western China, HBsAg prevalence was 11.6% 28 in Sichuan and 3.9% 29 in Gansu province. A more recent study of 2 million men aged 21-49 years in rural China enrolled in the National Free Preconception Health Examination Project (NFPHEP) between 2010 and 2012 reported HBsAg prevalence of 7.7%, 5.5% and 6.5% in Eastern, Central and Western China, respectively 30, while other large population-based cross-sectional studies, mostly conducted in Eastern China, have reported higher HBsAg prevalence in areas of lower socioeconomic status 31, coastal areas 32 and areas with a higher proportion of immigrants 33. In CKB, we also found large geographic variation in HBsAg prevalence: study sites in southern and eastern China had higher HBsAg prevalence than western and north-eastern sites, and urban sites tended to have higher prevalence than rural sites. This regional variation highlights both the need to draw on populations from diverse areas of China, where the relative importance of correlates of chronic HBV may vary, and the risk that pooled estimates across large regions may obscure important intra-region differences in HBsAg prevalence. The lower prevalence of HBsAg positivity in CKB compared with the 2006 National serosurvey 27, which reported an overall HBsAg prevalence of 7.2% (30-60 years: 8.6%), may reflect this regional variation, as well as the inclusion in CKB of adults aged over 59 years (who have lower HBsAg prevalence) and the lower sensitivity of the HBsAg test used in CKB compared with ELISA 34. The pattern of HBsAg positivity in relation to age has shifted over recent decades: as the proportion of vaccinated younger adults increases, HBsAg prevalence peaks at older ages. For example, the 2006 National serosurvey 27 reported peak HBsAg prevalence in 20-29 year olds (10.5%), while another large cross-sectional study of ≈ 87,000 adults recruited in 2009-2010 in Eastern China reported that HBsAg prevalence peaked at 11.6% in participants aged 35-40 years 31, and a third population-based study conducted in 2013 in Western China reported a peak prevalence of 10.5% in 53-57 year olds 35.

Table 1. Baseline characteristics of overall cohort and by Hepatitis B surface antigen status. HBsAg, hepatitis B surface antigen; MET, metabolic equivalent of task; SBP, systolic blood pressure; BMI, body mass index; CHD, coronary heart disease; TIA, transient ischaemic attack. a Married: participants who reported currently being married. b Smoking: current smoker includes those reporting occasional or current smoking. c Alcohol: ever-regular alcohol intake includes participants reporting monthly, reduced intake, weekly or ex-regular alcohol intake. d Dietary factors: regular includes participants reporting intake 4 or more times per week.

Table 2. Odds ratios and 95% CI for baseline factors by Hepatitis B surface antigen status. BMI, body mass index; MET, metabolic equivalent of task; NS, not significant. a Adjusted for age in 5-year categories, sex and study site (ten sites) where possible. b Adjusted for age in 5-year categories, sex and study site (ten sites), birth cohort, education, occupation, household income, number of years with a household fridge, alcohol intake, history of blood transfusion, body mass index and self-rated health where possible. c p trend presented for ordered categorical variables. d Other occupation includes house-wife/husband, self-employed, unemployed, other or not stated. e Smoking: current smoker includes those reporting occasional or current smoking. f Alcohol: ever-regular includes participants reporting monthly, reduced intake, weekly or ex-regular alcohol intake. g Regular fruit intake includes participants reporting intake 4 or more times per week.
Prevalence tends to decrease with age beyond this peak, as more of the population undergoes HBsAg seroclearance and a proportion of infected individuals are diagnosed and treated, or die from liver-related disease. The inverse association observed between older age and HBsAg prevalence is consistent with this, as participants in CKB are from the pre-vaccine era and thus largely unvaccinated.
Higher levels of HBsAg positivity among men than women have been described in past studies, including an absolute difference of about 3 percentage points in the 2006 National serosurvey 27 (men 8.6%; women 5.7%) and up to a twofold relative difference in odds of HBsAg positivity in other large population-based studies 36–38. This is similar to our finding of 1.4-fold higher odds of HBsAg positivity in men than in women. This sex disparity is hypothesized to be related to a differential HBV-related immune response, in which immune clearance of serum HBsAg is achieved in a higher proportion of women than men, and to women gaining better protection from HBV vaccination 39.
Past studies in China have also reported on the association between education level and HBsAg positivity 27,36,40–42, with most showing an inverse association, consistent with our findings. Findings for occupation have been mixed, although agricultural work has been associated with HBsAg positivity in several past studies 27,40,43, consistent with the higher prevalence of HBsAg positivity among agricultural workers in our study, which may reflect geographic variation and socioeconomic status. Two past studies in Henan 37 and Jilin 44 also reported no association between smoking and HBsAg positivity, while few studies have examined the association of self-rated health, alcohol intake or BMI with HBsAg positivity. Self-rated health is likely a marker of socioeconomic status, consistent with the higher HBsAg prevalence among participants with lower levels of education described in past studies. Two existing studies of the association between HBsAg and BMI had conflicting findings: one population-based study of ≈ 400,000 adults in Sichuan province found that participants with BMI ≥ 25 kg/m² were significantly more likely to be HBsAg positive than normal-weight (BMI 18.5-25 kg/m²) counterparts (OR 1.08, 95% CI 1.05-1.11) 44, while the other 45, in ≈ 3500 adults in Shanghai, reported an inverse association between BMI and odds of HBsAg positivity, with participants with BMI ≥ 28 kg/m² 49% (95% CI 6-72%) less likely to be HBsAg positive than participants of normal weight. The association we observed between HBsAg positivity and BMI is consistent with this latter study, and may reflect socioeconomic status or reverse causation, whereby participants with chronic HBV may have lost weight in the course of their illness. Furthermore, a U-shaped association between BMI and cirrhosis in CKB has been previously described 46.
Two past studies of Chinese adults, conducted in Sichuan and Guangdong provinces, reported lower risk of HBsAg positivity among occasional or low-to-moderate alcohol drinkers than among never drinkers 35,47, while another study 32 conducted in Zhejiang province found that any drinking was associated with a 30% (27-34%) higher risk of HBsAg positivity compared with no drinking. The apparent protective association between alcohol intake and HBsAg positivity in our study may reflect altered drinking behaviour among people who know they are HBV positive or who have CLD, for whom abstaining from alcohol may be recommended.
Since the first GWAS on chronic HBV was conducted in 2009, the set of SNPs significantly associated with chronic HBV has expanded from SNPs at HLA class II loci to include those at HLA class I loci, non-classical HLA SNPs and non-HLA SNPs. Most previous GWAS were based on diagnosed clinical conditions such as CLD or liver cancer 10,12–14,16,22, meaning that participants with other HBV phenotypes, such as less severe disease or chronic HBV without progression to liver disease, may be under-represented. Furthermore, although most GWAS have been performed in participants of East Asian ancestry, several used populations from particular geographic areas and tended to be modest in sample size, ranging from ≈ 4000 16 to ≈ 15,000 23 people. Our study included > 65,000 participants and replicated the associations of 17 SNPs with HBsAg positivity. We did not replicate rs7000921 (INTS10), previously reported in a case-control study of ≈ 9500 people of Chinese ancestry 24. However, the phenotype examined in that study was HBsAg positivity among anti-HBc-positive individuals, which we had limited power to explore because of the small size of the subcohort with anti-HBc data.
Existing evidence suggests there is little overlap between SNPs associated with HBsAg positivity and those associated with progression to HBV-related liver disease: a systematic review of published SNPs associated with different HBV phenotypes found that overlap occurred between SNPs associated with HBV positivity and those associated with HBV vaccine response, rather than with disease progression 11. Most past GWAS on disease progression have examined progression to hepatocellular carcinoma (HCC) among HBsAg-positive participants. These differences in population and phenotype may explain our finding that 14 of 18 SNPs were associated with CLD: we examined CLD more broadly among all participants regardless of HBsAg status, with limited power to investigate progression to CLD among HBsAg-positive participants.
The strengths of this study include its large size, its diverse geographic areas, its inclusion of both middle-aged and older adults, and the breadth of information that enabled investigation of a wide range of both conventional and genetic factors associated with HBsAg positivity. To date, research on risk factors for chronic HBV has focused on factors related to mother-to-child transmission and age at infection, while evidence on the associations of socioeconomic, behavioural and medical factors with chronic HBV among adults is lacking. Given that the main burden of chronic HBV-related disease occurs in middle-aged and older adults, and the low diagnosis rate of chronic HBV in China, our findings help fill this evidence gap. However, our study also has several limitations. First, the RDT HBsAg test has lower sensitivity than laboratory-based tests such as the ELISA used in most existing smaller studies 34, likely leading to underestimation of HBsAg prevalence, which may be more pronounced in those with lower viral load, such as older participants. Second, because other hepatitis markers (e.g. anti-HBc, e-antigen) were not available for the whole cohort, we could only compare HBsAg-positive with HBsAg-negative individuals. We were therefore unable to detect individuals with occult infection, or to investigate other phenotypes of interest such as HBsAg clearance. Although this approach is consistent with that of several past GWAS 12,14,22, others were able to investigate HBsAg clearance among a cohort of exposed participants 13,23,24. Third, we investigated SNPs identified in previous GWAS in different populations, mainly from the HLA region, but did not further explore the likely multiple independent effects of various SNPs in this part of the genome, which may vary among populations.
Nonetheless, our work adds to the evidence regarding the likely association between HLA variation and HBsAg positivity in Chinese populations. Fourth, we did not have information on other relevant risk factors, including drug use, number of sexual partners and vaccination status, or on viral subtype, which is an important source of disease heterogeneity. Finally, this is a cross-sectional study investigating associations of a range of non-genetic and genetic factors with prevalent chronic HBV, which does not capture the risk of incident infection. In summary, this study adds to current knowledge of factors associated with HBV chronicity in adults, which may help to inform targeted HBsAg screening, improving diagnosis and bringing more individuals into the HBV care continuum. Future research combining conventional and genetic risk factors, including viral genotypes, could further improve knowledge about the risk of HBV chronicity and disease progression.

Data availability
The China Kadoorie Biobank (CKB) is a global resource for the investigation of lifestyle, environmental, blood biochemical and genetic factors as determinants of common diseases. The CKB study group is committed to making the cohort data available to the scientific community in China, the UK and worldwide to advance knowledge about the causes, prevention and treatment of disease. For detailed information on what data are currently available to open access users and how to apply for them, visit: http://www.ckbiobank.org/site/Data+Access. Researchers who are interested in obtaining the raw data from the China Kadoorie Biobank study that underlie this paper should contact ckbaccess@ndph.ox.ac.uk. A research proposal will be requested to ensure that any analysis is performed by bona fide researchers and, where data are not currently available to open access researchers, is restricted to the topic covered in this paper.